Courses take place from 9.15-12 and from 13.15-16 in room HAL 230
For group work
Thursday 9th: HAL 224a, 225, 226, 228, 229, 131
Friday 10th: HAL 224a, 225, 226, 228, 229, 131
Here you will find the group list (group membership, coaching times, time of final presentation)
Dear students,
A warm welcome to the module Data skills for social work professionals!
I would like to give you some important information on the course:
Physical presence in the course is not mandatory except for the last day (Jan 17), but I strongly urge you to participate in the course during the first four days. For those who do work part-time: please schedule your work accordingly.
It is imperative that you gain first experience with R and RStudio and make sure they run on your computer. Please follow the instructions in the “Installation of R and R-Studio” guide (SWITCHdrive), and if necessary, refer to the linked instructions on YouTube. If you have any questions, please feel free to contact us via email. Please use Copilot with the prompt below to guide you through the installation, explain the software to you in easy language and show you what you can do with it.
Enroll on the Moodle page (Kurs: Data skills for social work professionals (in English) - HS24 | BFH Moodle – Die Lernplattform der Berner Fachhochschule) with the following key: HS24-bsc. At least a week before the course you will find a link to the script of the course as well as the relevant literature that you need to prepare and other relevant information. We wish you a successful preparation period and look forward to meeting you in person soon. Please let us know should you have any questions.
Kind regards
Dorian Kessler
Text to enter into Copilot (Microsoft Copilot in Bing; important: use the conversation style “more creative”/creative mode (button in the middle of the screen)): As a social work student, I would like to learn the basics of the programming language R in order to carry out statistical data analyses for projects in social work. I have no prior knowledge of statistics or programming. Could you please give me a step-by-step introduction? Please start by asking whether I have installed R and RStudio, and if not, support me in installing R and RStudio. Then show me the basic commands and functions of R. I would like to learn how to carry out simple data analyses (e.g. comparisons of means with dplyr), visualize data (with ggplot2) and interpret results. Please observe the following points:
Proceed step by step. Only tell me about the next step once the current step is completed. After each step, ask whether I was able to complete it successfully, to make sure I have done everything correctly.
As a first step, tell me exactly how to orient myself visually in RStudio and where I need to make inputs. Where are the console, script, environment and file panes located in RStudio?
Explain to me what the console is and what an R script is, how to create and save an R script, and what the purpose of scripts is. Work with me in an R script and tell me how to run commands.
Please guide me through practical exercises and give me tasks to consolidate what I have learned.
Offer me support when something is unclear.
Work with examples that are relevant to social work. Invent relevant data from the areas of social assistance or child and adult protection.
Comment the code line by line in detail so that I understand it exactly.
At the end, offer me further exercises if I feel like it. Make suggestions for exercises.
You are an R expert, but you also know that social workers in training have little programming knowledge and that technical terms need an everyday-language explanation.
Thank you for your motivated support and helpfulness as my personal tutor! You help me learn R as needed and use this knowledge for my clients.
Important details:
Please leave out print() where it is not necessary.
Whenever you mention Strg, also add Ctrl, in case some people have English Windows keyboards.
People gain awareness of data science tools and how they could be used for social work.
People learn how to critically evaluate data science products.
People learn how to do data science with R.
Term that emerged ca. 10 years ago. Predecessors: Statistics, Data analysis.
The science of creating valuable information from data
Practice-oriented science
Combines technical and field expertise
Data contains information on human behavior = helps us better understand the human world and solve human problems.
In the era of AI, “data literacy” becomes a key skill in all areas of life, including social work; it should be a basic competence.
Awareness of data and knowing how to use them
Skills to interpret data science products
Skills to analyze data correctly
You are a highly experienced data science expert and consultant specializing in applications for social organizations. Your task is to help me design a clear, impactful, and practical example of how data science could be applied in social work, tailored to my specific interests, expertise, and work context. Use the file by Dorian Kessler on potential use cases (ask me to upload it) as a reference and engage with me through targeted questions to fully understand my context before developing the application idea. The goal is to create a solution that is actionable, highly beneficial, and easy to present to colleagues.
Conversation Steps:
Step 1: Understand the Context
Ask me targeted questions to build a clear picture of my work environment and challenges:
“Are you currently working in social work? If not, what field or area interests you most?”
“Who are the clients or groups you work with, or aim to serve?”
“What are the most common tasks or processes in this field?”
“What are the most pressing challenges or problems you encounter in this area?”
“What data is currently available, or could realistically be collected, to improve workflows or solve these challenges?”
Step 2: Suggest Solutions
Using the information I provide and the reference material, propose three realistic examples where data science could address challenges or improve processes. Each example should:
Clearly outline the concept and how it works.
Highlight specific benefits for the field of social work.
Address the potential feasibility and required resources.
Step 3: Get Feedback and Refine
Ask for feedback: “Do these ideas seem relevant and practical for your field? How could they be adjusted to better fit your needs or constraints?”
Refine the ideas based on my input, providing more specific or adapted examples if needed.
Help me select the most promising application to share with my peers, ensuring it’s polished and well-presented.
Throughout the process, aim to keep the ideas practical and impactful, while guiding me in envisioning how they could be implemented effectively in the social work domain.
See module description.
You will analyze one of the following (or your own) data sets and research questions
Form: presentation and discussion on final day
R is free and open source.
R has an array of powerful statistical methods.
All additional tools can be freely downloaded, installed and loaded as so-called packages.
With ggplot2, R allows you to create beautiful figures.
With the tidyverse and dplyr, R has the simplest and most intuitive language for data preparation and analysis.
R is more than just statistical software (cf. shiny).
R is well known by ChatGPT. Disclaimer: we will focus on the most necessary tools for working with the data you have.
RStudio Environment
# Install required packages if they are not already installed
required_packages <- c("readxl", "dplyr", "tidyr", "ggplot2", "officer", "flextable")
installed_packages <- installed.packages()
for(pkg in required_packages){
if(!(pkg %in% rownames(installed_packages))){
install.packages(pkg)
}
}
# Load the packages
library(readxl)
library(dplyr)
library(tidyr)
library(ggplot2)
library(officer)
library(flextable)
# Set the working directory
setwd("C:/Users/kld1/Downloads/")
# 1. Download Excel file
# https://www.pxweb.bfs.admin.ch/pxweb/de/px-x-1304030000_134/-/px-x-1304030000_134.px/table/tableViewLayout2/
url <- "https://www.pxweb.bfs.admin.ch/sq/ecfd5274-e21f-4d26-9bcf-5326af3edc9a"
destfile <- "sozialhilfe.xlsx"
download.file(url, destfile, mode = "wb")
# 2. Read and process data
# Read the Excel sheet (if multiple sheets exist, choose the correct one)
# Assuming the data is in the first sheet
raw_data <- read_excel(destfile, sheet = 1, skip = 2) # Skip the first 2 rows containing metadata
# Process the data: Select columns, rename, filter rows
data <- raw_data %>%
select(Kanton='...2', contains("20")) %>%
filter(Kanton %in% c("Bern / Berne", "Zürich", "Basel-Stadt", "Genève"))
# Transform the data from wide to long format
long_data <- data %>%
pivot_longer(
cols = `2010`:`2023`,
names_to = "Year",
values_to = "Count"
) %>%
mutate(Year = as.integer(Year),
Count = as.numeric(Count))
# 3. Create a plot with ggplot2
# Create a nice ggplot graphic
plot <- ggplot(long_data, aes(x = Year, y = Count, color = Kanton)) +
geom_line(size = 1) +
theme_minimal() +
labs(
title = "Number of Social Assistance Recipients per Canton (2010-2023)",
x = "Year",
y = "Number of Recipients",
color = "Canton"
) +
theme(
plot.title = element_text(hjust = 0.5, size = 16, face = "bold"),
axis.text = element_text(size = 10),
axis.title = element_text(size = 12),
legend.title = element_text(size = 12),
legend.text = element_text(size = 10)
)
# Save the plot as an image to insert into Word
ggsave("sozialhilfe_plot.png", plot = plot, width = 12, height = 8, dpi = 300)
# 4. Insert the plot into a Word document
# Create a new Word document
doc <- read_docx()
# Add a title
doc <- doc %>%
body_add_par("Number of Social Assistance Recipients per Canton (2010-2023)", style = "heading 1")
# Add the plot
doc <- doc %>%
body_add_img(src = "sozialhilfe_plot.png", width = 6, height = 4, style = "centered")
# Optional: Add a table with the data
# Create an example table (here the first 10 rows)
table_data <- long_data %>%
filter(Kanton %in% c("Bern / Berne", "Zürich", "Basel-Stadt", "Genève"))
ft <- flextable(table_data) %>%
# Automatically adjust column widths to fit content
autofit() %>%
# Set table width to 100% of the document width
width(j = 1:3, width = 1.5) %>% # Adjust individual column widths if necessary
set_table_properties(width = 1, layout = "autofit") %>%
# Optional: Enhance table aesthetics
theme_box() %>%
fontsize(size = 10, part = "all") %>%
bold(part = "header") # Bold the header row
# Add the table
doc <- doc %>%
body_add_par("Example Table of the Data", style = "heading 2") %>%
body_add_flextable(ft)
# Save the Word document
print(doc, target = "Social_Assistance_Report.docx")
ChatGPT and other frontier Large Language Models know R pretty well (Copilot also works, but the newest models are more capable).
After you ask a question, tell ChatGPT what your data look like.
If you have no sensitive data, just paste the data in to show ChatGPT the structure. If you have sensitive data, just paste the header (= variable names). With Copilot, data protection issues are smaller.
Paste the resulting code back into the R script and run the code.
If you have errors, paste the error (from the console) back into ChatGPT and tell it to solve the problem.
Tell ChatGPT to only give you relevant code if you adapt parts of your overall code.
If it doesn’t comment the code, ask it to add comments explaining what each piece of code does.
Example: transforming the above code into an interactive app with one prompt.
Open a new R script, copy the above code into it and save it
Ask it to assist you while giving helpful and targeted advice, i.e. that it should tell you how to change the code. Try the following tasks:
R allows you to read in data in all formats, including directly from the internet (see package rvest).
The most common data storage format is Excel tables. You can open them with the readxl package.
The most universal data storage format is csv (comma-separated values).
The best way to deal with large data is the data.table package (to read in large csv data files) and the arrow package (to save and read in large data).
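The csv roundtrip can be sketched in a few lines; the file and column names below are invented for illustration:

```r
# Minimal sketch: write a small data frame to csv and read it back in
df <- data.frame(Gruppe = c("Jungs", "Maedels"), Anzahl = c(3, 5))
write.csv(df, "besuche.csv", row.names = FALSE)  # creates the file in the working directory
df2 <- read.csv("besuche.csv")                   # read it back in as a data frame
```

Because csv is plain text, the same file can be opened in Excel, a text editor, or any other statistical software.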
#Set the working directory. Here we use the download folder
setwd("C:/Users/kld1/Downloads/")
#Download data to the folder by hand
#Büro: https://drive.switch.ch/index.php/s/gdNYHopxWDCV9hr
#Turnhalle: https://drive.switch.ch/index.php/s/am1T36ehPL24QuQ
#Install and load excel package
install.packages("readxl")
library(readxl)
#Read in data from the working directory
Buero <- read_excel("OJAOffice_Statistikdaten_Jugendbüro Oberburg 23.xlsx",sheet="Statistikdaten 2024")
Turnhalle<- read_excel("OJAOffice_Statistikdaten_offene Turnhalle 24.xlsx",sheet="Statistikdaten 2024")
#Explain objects, observations and variables
#Explain range and col_names = FALSE
RStudio allows you to manually scroll through data
This helps you better understand what is going on
#Explain what rows (observations) and columns (variables) are.
#You can either click on the object or...
#use View()
View(Buero)
View(Turnhalle)
#Or even fix data (never do this!)
fix(Buero)
| Object name to save the data with | Year | Sheet to read and additional restrictions | File |
|---|---|---|---|
| Maedels_22 | 2022 | Statistikdaten 2024 | OJAOffice_Statistikdaten_Moditrff.xlsx |
| Maedels_23 | 2023 | Moditraeff | Statistik OJA Angebote Burgdorf 2023.xlsx |
| Jungs_22 | 2022 | Statistikdaten 2024 | OJAOffice_Statistikdaten_Gieltrff.xlsx |
| Jungs_23 | 2023 | Gielaetraeff, range="A1:B11", col_names=FALSE | Statistik OJA Angebote Burgdorf 2023.xlsx |
| JuBu_23 | 2023 | JuBU Traeff 5&6 | Statistik OJA Angebote Burgdorf 2023.xlsx |
| JuBu_24 | 2024 | Statistikdaten 2024 | Copy of OJAOffice_Statistikdaten_Mittelstufentreff 24.xlsx |
#data: https://drive.switch.ch/index.php/s/LoJjPR0XAVZWFYx
library(readxl)
Maedels_22 <- read_excel("OJAOffice_Statistikdaten_Moditrff.xlsx")
Maedels_23 <- read_excel("Statistik OJA Angebote Burgdorf 2023.xlsx",sheet = "Moditraeff")
Jungs_22 <- read_excel("OJAOffice_Statistikdaten_Gieltrff.xlsx")
Jungs_23 <- read_excel("Statistik OJA Angebote Burgdorf 2023.xlsx",sheet = "Gielaetraeff",
range = "A1:B11",#which cells we need from the Excel sheet: column A row 1 to column B row 11
col_names = FALSE)#the first row contains data values, not column names
JuBu_23 <- read_excel("Statistik OJA Angebote Burgdorf 2023.xlsx",sheet = "JuBU Traeff 5&6")
JuBu_24 <- read_excel("Copy of OJAOffice_Statistikdaten_Mittelstufentreff 24.xlsx",sheet = "Statistikdaten 2024")
View(Maedels_22)
filter(): selects a subset of rows (see also slice()). Useful comparison operators: >, <, ==, !=, !, |, %in%, is.na(x), is.infinite(x).
select(): selects columns
mutate(): creates new columns
group_by(): defines subgroups in the data so that mutate() and summarize() can be applied separately per group.
summarize(): aggregates (collapses) data to individual data points
arrange(): sorts
distinct(): removes duplicate values
All these verbs can be chained with the pipe %>%, which makes the code much easier to read and more compact.
#install package
#install.packages("dplyr")
#load package
library(dplyr)
#https://drive.switch.ch/index.php/s/eatQbxi4RjEFXKC
setwd("C:/Users/kld1/switchdrive/BFH/Wichtige Dokumente/Lehre/Data skills for social work professionals/BFH/Daten/Fokus Arbeit/")
focarb <- read.csv("FokusArbeit_Wirkung.csv")
#Select: select the variables Vitality1 (=measurement of vitality before the intervention), Vitality2 (=measurement of vitality after the intervention) and Interventionsgruppe (did the person participate in Fokus Arbeit or receive standard counseling)
focarb <- focarb %>%
select(Vitality1,Vitality2,Interventionsgruppe)%>%
#Filter: out observations that have a missing value (NA = not available) on the measurement before the intervention.
filter(!is.na(Vitality1))%>%
#Mutate: calculate a new variable that measures the change in vitality before versus after
mutate(Change_Vitality=Vitality2-Vitality1)
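The example above uses select(), filter() and mutate(); group_by() and summarize() from the list can be sketched on a small invented data set (all names and numbers are made up):

```r
library(dplyr)

# Invented attendance data: number of visitors per group and evening
besuche <- data.frame(
  Gruppe = c("Jungs", "Jungs", "Maedels", "Maedels"),
  Anzahl = c(10, 14, 8, 12)
)

mittelwerte <- besuche %>%
  group_by(Gruppe) %>%                 # define subgroups
  summarize(Mittelwert = mean(Anzahl)) # one aggregated row per group
```

The result has one row per group with the mean number of visitors.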
Goal: find out the share of social workers among the working population in Europe and in Switzerland.
Read in data from the European Social Survey with the following steps:
Use the password to download the data
Relocate the working directory (using setwd()) to
your download folder or move the data to your working directory
Make sure you have the arrow package installed
(otherwise use install.packages("arrow")).
Read in the data with the read_parquet()
command.
Save it into the object ess.
Find out which variables measure the country of origin and the occupation of the respondent using the variable list. Hint 1: start searching from the top. Hint 2: occupation is measured with the ISCO08 classification. (For earlier years it is the ISCO88 classification, but the data is reduced to years with the ISCO08 classification, i.e. after 2010.) Use the variable list to find the exact variable names and labels.
Reduce the data frame to those two variables: country, ISCO08.
Filter out observations of individuals where information on ISCO08 is missing. The following values should be excluded:
| 66666 | Not applicable* |
| 77777 | Refusal* |
| 88888 | Don’t know* |
| 99999 | No answer* |
Create a new variable socialworker that measures whether someone is a social worker or not (click on the variable to see which numbers stand for social workers). Use ifelse(isco08 == NUMBERFORSOCIALWORKERS, "Yes", "No") to define the variable.
Check the number of individuals (i.e. rows) in the total data frame using nrow(data.frame). Store it into an object.
Filter out individuals that are social workers and check the number of individuals again. What is the proportion of social workers in Europe? And in Switzerland?
library(arrow)
library(dplyr)
#Set working directory to your download folder
setwd("C:/Users/kld1/switchdrive/BFH/Wichtige Dokumente/Lehre/Data skills for social work professionals/BFH/Daten/European Social Survey/")
#https://drive.switch.ch/index.php/s/vSytsi7wl1H1yf3
#Load in data
ess <- read_parquet("ESS.parquet")
#Select variables with information on country and occupation
ess <- ess%>%
select(cntry,isco08)
#Filter out people with no information on job
ess <- ess%>%
filter(isco08<10000)
#Is someone a social worker
ess <- ess %>%
mutate(socialworker= ifelse(isco08 == 2635, "Yes","No"))
total<-nrow(ess)
socialworkers<-nrow(ess%>%filter(socialworker=="Yes"))
socialworkers/total
total.CH<-nrow(ess%>%filter(cntry=="CH"))
socialworkers.CH<-nrow(ess%>%filter(socialworker=="Yes"&cntry=="CH"))
socialworkers.CH/total.CH
R allows for easy combinations of multiple data frames.
There are two ways to combine data:
Just glue them together using rbind(dataframe1, dataframe2) and cbind(dataframe1, dataframe2):
rbind: binds together rows when the columns are the same in the two data frames
cbind: binds together columns when the rows are the same in the two data frames
Row-wise linkage using an identifier variable with merge(dataframe1, dataframe2, by = "identifier").
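The difference between gluing and linking can be sketched with two tiny invented data frames that share the identifier variable id:

```r
# Two tiny invented data frames sharing the identifier variable id
df1 <- data.frame(id = c(1, 2), alter = c(15, 16))
df2 <- data.frame(id = c(2, 1), gruppe = c("B", "A"))

stacked <- rbind(df1, df1)             # glue rows together (columns must match)
linked  <- merge(df1, df2, by = "id")  # link rows via the identifier
```

Note that merge() matches rows by id, so the order of rows in df2 does not matter, whereas cbind() would simply paste the columns side by side in their current order.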
library(arrow)
library(dplyr)
#data: https://drive.switch.ch/index.php/s/vSytsi7wl1H1yf3
setwd("C:/Users/kld1/switchdrive/BFH/Wichtige Dokumente/Lehre/Data skills for social work professionals/BFH/Daten/European Social Survey/")
#Load in data
ess <- read_parquet("ESS.parquet")
#Rbind
Swiss<-ess%>%
filter(cntry=="CH")
Germans<-ess%>%
filter(cntry=="DE")
Austrians<-ess%>%
filter(cntry=="AT")
DACH<-rbind(Swiss,Germans,Austrians)
#Cbind and merge
#Data frame 1
ess1 <- ess%>%
filter(essround==11)%>%
select(idno,isco08)
ess2 <- ess%>%
filter(essround==11)%>%
select(idno,happy)
ess3 <- cbind(ess1,ess2)
ess2b <-ess%>%
filter(essround==11)%>%
select(idno,happy)%>%
arrange(happy)
ess4 <- merge(ess1,ess2b,by="idno")
Read in the data sets from Jugendarbeit Burgdorf and Oberdorf (Data Oberdorf: Büro, Turnhalle; Data Burgdorf: see here).
Create a gesamttabelle_oberdorf including the two data frames from Oberdorf and a gesamttabelle_burgdorf including the five data frames from Burgdorf. Create and select three variables: Datum (from 'Erfasst am'), Anzahl, and Gruppe (from 'Statistikfeld').
Check the data sets and bind them together with the correct command.
setwd("C:/Users/kld1/switchdrive/BFH/Wichtige Dokumente/Lehre/Data skills for social work professionals/BFH/Daten/Burgdorf/Turnhalle/")
library(readxl)
Buero <- read_excel("OJAOffice_Statistikdaten_Jugendbüro Oberburg 23.xlsx",sheet="Statistikdaten 2024")
Turnhalle<- read_excel("OJAOffice_Statistikdaten_offene Turnhalle 24.xlsx",sheet="Statistikdaten 2024")
gesamttabelle_oberdorf <- rbind(Buero,Turnhalle)%>%
select(Datum="Erfasst am",Anzahl, Gruppe=Statistikfeld)
setwd("C:/Users/kld1/switchdrive/BFH/Wichtige Dokumente/Lehre/Data skills for social work professionals/BFH/Daten/Burgdorf/Zusammenlegung/")
Maedels_22 <- read_excel("OJAOffice_Statistikdaten_Moditrff.xlsx")
Maedels_23 <- read_excel("Statistik OJA Angebote Burgdorf 2023.xlsx",sheet = "Moditraeff")
Jungs_22 <- read_excel("OJAOffice_Statistikdaten_Gieltrff.xlsx")
Jungs_23 <- read_excel("Statistik OJA Angebote Burgdorf 2023.xlsx",sheet = "Gielaetraeff",
range = "A1:B11",#which cells we need from the Excel sheet: column A row 1 to column B row 11
col_names = FALSE)#the first row contains data values, not column names
JuBu_23 <- read_excel("Statistik OJA Angebote Burgdorf 2023.xlsx",sheet = "JuBU Traeff 5&6")
JuBu_24 <- read_excel("Copy of OJAOffice_Statistikdaten_Mittelstufentreff 24.xlsx",sheet = "Statistikdaten 2024")
gesamttabelle_burgdorf <- rbind(Maedels_22 %>%
select(Datum='Erfasst am', Anzahl, Gruppe=Statistikfeld),
Maedels_23 %>%
select(Datum='Erfasst am', Anzahl, Gruppe=Statistikfeld),
Jungs_22 %>%
select(Datum='Erfasst am', Anzahl, Gruppe=Statistikfeld),
Jungs_23 %>%
select(Datum=...1, Anzahl=...2)%>%
mutate(Gruppe="Jungs")%>%select(Datum, Anzahl, Gruppe),
JuBu_23 %>%
select(Datum='Erfasst am', Anzahl, Gruppe=Statistikfeld),
JuBu_24 %>%
select(Datum='Erfasst am', Anzahl, Gruppe=Statistikfeld))
The two most frequently used operations with text variables are searching for text patterns within cells (grepl) and replacing parts of the text in a cell (gsub).
Regular expressions enable “wildcards” and other elements to define a search pattern.
A selection of metacharacters:
.: matches any character
[a-z]: lowercase a to z
[A-Z]: uppercase A to Z
[a-zA-Z]: lowercase a to z and uppercase A to Z
[0-9]: digits
*: matches at least 0 times
+: matches at least 1 time
?: matches at most 1 time
{n}: matches exactly n times
{n,}: matches at least n times
{n,m}: matches between n and m times
^: matches at the beginning of the string
$: matches at the end of the string
(): groups parts of a pattern
Use ChatGPT for creating regular expressions, e.g. “give me a regular expression that is true when the 3rd-last character is an ‘a’”.
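A minimal sketch of grepl() and gsub() with these metacharacters, on invented Statistikfeld-style values:

```r
# Invented example values
x <- c("Jungs 11-15", "Maedels 6-10", "Maedels 11-15")

treffer    <- grepl("Maedels", x)             # TRUE where the pattern occurs
ohne_alter <- gsub(" [0-9]+-[0-9]+", "", x)   # remove age ranges like " 11-15"
anfang     <- grepl("^Jungs", x)              # ^ anchors the match at the string start
```

Here [0-9]+ matches one or more digits, so the same pattern strips both "6-10" and "11-15".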
#https://drive.switch.ch/index.php/s/LoJjPR0XAVZWFYx
library(dplyr)
library(readxl)
setwd("C:/Users/kld1/switchdrive/BFH/Wichtige Dokumente/Lehre/Data skills for social work professionals/BFH/Daten/Burgdorf/Zusammenlegung/")
Maedels_22 <- read_excel("OJAOffice_Statistikdaten_Moditrff.xlsx")
Maedels_23 <- read_excel("Statistik OJA Angebote Burgdorf 2023.xlsx",sheet = "Moditraeff")
Jungs_22 <- read_excel("OJAOffice_Statistikdaten_Gieltrff.xlsx")
Jungs_23 <- read_excel("Statistik OJA Angebote Burgdorf 2023.xlsx",sheet = "Gielaetraeff",
range = "A1:B11",#which cells we need from the Excel sheet: column A row 1 to column B row 11
col_names = FALSE)#the first row contains data values, not column names
JuBu_23 <- read_excel("Statistik OJA Angebote Burgdorf 2023.xlsx",sheet = "JuBU Traeff 5&6")
JuBu_24 <- read_excel("Copy of OJAOffice_Statistikdaten_Mittelstufentreff 24.xlsx",sheet = "Statistikdaten 2024")
gesamttabelle_burgdorf <- rbind(Maedels_22 %>%
select(Datum='Erfasst am', Anzahl, Gruppe=Statistikfeld),
Maedels_23 %>%
select(Datum='Erfasst am', Anzahl, Gruppe=Statistikfeld),
Jungs_22 %>%
select(Datum='Erfasst am', Anzahl, Gruppe=Statistikfeld),
Jungs_23 %>%
select(Datum=...1, Anzahl=...2)%>%
mutate(Gruppe="Jungs")%>%select(Datum, Anzahl, Gruppe),
JuBu_23 %>%
select(Datum='Erfasst am', Anzahl, Gruppe=Statistikfeld),
JuBu_24 %>%
select(Datum='Erfasst am', Anzahl, Gruppe=Statistikfeld))
gesamttabelle_burgdorf <-gesamttabelle_burgdorf%>%
mutate(Geschlecht=ifelse(grepl("Mädels",Gruppe, useBytes = TRUE),"Girls","Boys"))%>%#useBytes needed because of ä
mutate(Geschlecht2=gsub("[0-9]+-[0-9]+","",Gruppe))
ChatGPT can also be used to convert text into numeric data: not only to extract explicit information, but also to extract meaning between the lines.
R handles dates and times using the POSIX standard, widely used in computing.
Dates are stored as numeric values, representing the number of days since the “epoch” (January 1, 1970). as.Date("1970-01-02") equals 1, as it is one day after the epoch.
When times are included, R uses seconds since the epoch. as.POSIXct("1970-01-01 00:01:00") equals 60, representing 60 seconds after the epoch.
This system allows for efficient calculations, such as finding time differences, sorting dates and times, or extracting simpler representations of time.
To transform character strings into date or POSIX formats, use the following:
as.Date() with a specified format, such as as.Date("04.12.2024", format = "%d.%m.%Y") for European-style dates.
as.POSIXct() for date-times, e.g., as.POSIXct("04-12-2024 14:30", format = "%d-%m-%Y %H:%M").
Format strings are key to parsing correctly:
%Y: 4-digit year, %m: month, %d: day
%H: hour (24-hour format), %M: minute, %S: second
To extract only parts of a date or time, use the format() function. For example, to reduce "2022-04-01 15:21:03" to "2022-04-01", you can use:
datetime <- as.POSIXct("2022-04-01 15:21:03")
format(datetime, "%Y-%m-%d")
format(datetime, "%Y-%m")
format(datetime, "%Y")
as.POSIXct() / as.Date() converts the character string into a date-time/date object, and format() extracts the desired components using format codes.
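Putting the pieces together, simple date arithmetic can be sketched as follows (the dates are invented for illustration):

```r
# Parse a European-style date string and do simple date arithmetic
d1 <- as.Date("04.12.2024", format = "%d.%m.%Y")  # day.month.year
d2 <- as.Date("2024-01-01")                       # ISO format parses directly

tage  <- as.numeric(d1 - d2)   # difference in days
monat <- format(d1, "%Y-%m")   # extract year and month only
```

Subtracting two Date objects works because both are internally stored as days since the epoch.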
Load in the data of Oberdorf (Büro, Turnhalle) and bind them together.
Create a variable age_group from the variable
Statistikfeld. Use either gsub() or
grepl() to extract the relevant information.
Create a variable months_since_jan24 from the
variable 'Erfasst am' that indicates the number of months
between the current date and the introduction of the Turnhalle offer. Do
it with the following steps:
Define the reference date in as.POSIXct() format and
store it into an object (see example above).
Rename the variable 'Erfasst am' into a new one called erfassungsdatum. It is possible to work with variable names that contain a space, but it is inconvenient (you always have to use backticks).
Calculate the difference with the following formula:
(as.numeric(format(erfassungsdatum, "%Y")) - as.numeric(format(reference_date, "%Y"))) * 12 + (as.numeric(format(erfassungsdatum, "%m")) - as.numeric(format(reference_date, "%m")))
#Download data to the folder by hand
#Büro: https://drive.switch.ch/index.php/s/gdNYHopxWDCV9hr
#Turnhalle: https://drive.switch.ch/index.php/s/am1T36ehPL24QuQ
#Load packages
library(readxl)
library(dplyr)
#Read in data
setwd("C:/Users/kld1/switchdrive/BFH/Wichtige Dokumente/Lehre/Data skills for social work professionals/BFH/Daten/Burgdorf/Turnhalle/")
Buero <- read_excel("OJAOffice_Statistikdaten_Jugendbüro Oberburg 23.xlsx",sheet="Statistikdaten 2024")
Turnhalle<- read_excel("OJAOffice_Statistikdaten_offene Turnhalle 24.xlsx",sheet="Statistikdaten 2024")
#Bind together
gesamttabelle <- rbind(Buero,Turnhalle)
#Create variable age_group
gesamttabelle <- gesamttabelle %>%
mutate(age_group = gsub("^(Jungs|Mädels) ", "", Statistikfeld))
#Create a variable months_since_jan24
# Reference date
reference_date <- as.Date("2024-01-01")
# Calculate months difference
gesamttabelle <- gesamttabelle %>%
mutate(erfassungsdatum=`Erfasst am`)%>%
mutate(
months_since_jan_2024 = (as.numeric(format(erfassungsdatum, "%Y")) -
as.numeric(format(reference_date, "%Y"))) * 12 +
(as.numeric(format(erfassungsdatum, "%m")) - as.numeric(format(reference_date, "%m"))
)
)
#Für die Gender Daten
library(dplyr)
library(readxl)
setwd("C:/Users/kld1/switchdrive/BFH/Wichtige Dokumente/Lehre/Data skills for social work professionals/BFH/Daten/Burgdorf/Zusammenlegung/")
Maedels_22 <- read_excel("OJAOffice_Statistikdaten_Moditrff.xlsx")
Maedels_23 <- read_excel("Statistik OJA Angebote Burgdorf 2023.xlsx",sheet = "Moditraeff")
Jungs_22 <- read_excel("OJAOffice_Statistikdaten_Gieltrff.xlsx")
Jungs_23 <- read_excel("Statistik OJA Angebote Burgdorf 2023.xlsx",sheet = "Gielaetraeff",
range = "A1:B11",#which cells we need from the Excel sheet: column A row 1 to column B row 11
col_names = FALSE)#the first row contains data values, not column names
JuBu_23 <- read_excel("Statistik OJA Angebote Burgdorf 2023.xlsx",sheet = "JuBU Traeff 5&6")
JuBu_24 <- read_excel("Copy of OJAOffice_Statistikdaten_Mittelstufentreff 24.xlsx",sheet = "Statistikdaten 2024")
gesamttabelle_burgdorf <- rbind(Maedels_22 %>%
select(Datum='Erfasst am', Anzahl, Gruppe=Statistikfeld),
Maedels_23 %>%
select(Datum='Erfasst am', Anzahl, Gruppe=Statistikfeld),
Jungs_22 %>%
select(Datum='Erfasst am', Anzahl, Gruppe=Statistikfeld),
Jungs_23 %>%
select(Datum=...1, Anzahl=...2)%>%
mutate(Gruppe="Jungs")%>%select(Datum, Anzahl, Gruppe),
JuBu_23 %>%
select(Datum='Erfasst am', Anzahl, Gruppe=Statistikfeld),
JuBu_24 %>%
select(Datum='Erfasst am', Anzahl, Gruppe=Statistikfeld))
gesamttabelle_burgdorf <-gesamttabelle_burgdorf%>%
mutate(Geschlecht=ifelse(grepl("Mädels",Gruppe, useBytes = TRUE),"Girls","Boys"))%>%#useBytes needed because of ä
mutate(Geschlecht2=gsub("[0-9]+-[0-9]+","",Gruppe),
Alter=ifelse(grepl("11-15",Gruppe),"11-15",
ifelse(grepl("6-10",Gruppe),"6-10","16-20")),
Alter2=gsub("^(Jungs|Mädels) ","",Gruppe))
#Create a variable counting months since the reference date
# Reference date
reference_date <- as.Date("2023-07-01")
# Calculate months difference
gesamttabelle <- gesamttabelle_burgdorf %>%
mutate(
months_since_jan_2024 = (as.numeric(format(Datum, "%Y")) -
as.numeric(format(reference_date, "%Y"))) * 12 +
(as.numeric(format(Datum, "%m")) - as.numeric(format(reference_date, "%m"))
)
)
Use count(x) to count how often each value occurs in your variable. Such frequency tables are the most universal strategy to summarize your data. They are applicable to all data types (remember from introduction to quantitative social research: nominal, ordinal, numeric).
#With real data
library(arrow)
library(dplyr)
setwd("C:/Users/kld1/switchdrive/BFH/Wichtige Dokumente/Lehre/Data skills for social work professionals/BFH/Daten/European Social Survey/")
#data: https://drive.switch.ch/index.php/s/vSytsi7wl1H1yf3
#Load in data
ess <- read_parquet("ESS.parquet")
ess%>%
count(cntry)
#Add percentages
ess%>%
count(cntry)%>%
mutate(Percentages=n/sum(n)*100)#tip: wrap in round() for fewer decimals
Data can be summarized using the summarise() function.
If numeric, single variables can be described with a single number, so-called distributional parameters.
Distributional parameters quantify the central tendency of a single variable:
Use mean(x) to find the central tendency, or simply the “average” value. The median, sometimes a better measure for the central tendency of data, is calculated using median(x).
Distributional parameters also quantify the variability of a single variable:
Use the minimum min(x) and maximum max(x) to measure the range of the data.
Use the standard deviation sd(x) to measure how much the data varies from the mean.
library(arrow)
setwd("C:/Users/kld1/switchdrive/BFH/Wichtige Dokumente/Lehre/Data skills for social work professionals/BFH/Daten/European Social Survey/")
#data: https://drive.switch.ch/index.php/s/vSytsi7wl1H1yf3
#Load in data
ess <- read_parquet("ESS.parquet")
ess%>%summarise(Average_Happiness=mean(happy),
Minimum_Happiness=min(happy),
Maximum_Happiness=max(happy),
SD_Happiness=sd(happy))
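Why the median can be the better measure of central tendency is easiest to see with a small invented example containing one extreme value:

```r
# One extreme value pulls the mean but not the median
x <- c(2, 3, 3, 4, 100)

mean(x)    # 22.4, dominated by the outlier
median(x)  # 3, robust against the outlier

range_x <- c(min(x), max(x))  # the range reveals the extreme value
sd(x)                         # large, because the data varies a lot around the mean
```

With skewed data such as incomes or lengths of benefit receipt, the mean and median can differ substantially; reporting both is often informative.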
Read in the data of Fokus Arbeit.
Remove observations from the control group and missing values
(definition of the control group:
Interventionsgruppe == 0).
What percentage of participants are women? What percentage are long-term recipients (more than 12 months of benefit receipt)?
Calculate the mean and standard deviation of the measurements
before and after the intervention for vitality (Vitality)
and social support (Sozial). Measurements before the
intervention end with a 1, measurements after the intervention end with
a 2. Bind the data together into one large data.frame. Hint for more
efficient code: ask ChatGPT to write the code with the
across() function. This allows you to perform the same functions
(pre-defined functions such as mean(), but also your own
functions) on multiple variables.
Which aspects changed the most? Which distributional parameters changed most? Do you observe changes in variability of measurements?
library(dplyr)
#load data: #https://drive.switch.ch/index.php/s/eatQbxi4RjEFXKC
setwd("C:/Users/kld1/switchdrive/BFH/Wichtige Dokumente/Lehre/Data skills for social work professionals/BFH/Daten/Fokus Arbeit/")
focarb <- read.csv("FokusArbeit_Wirkung.csv")
focarb%>%count(Weiblich)
focarb%>%
mutate(Langzeit=Bezugsdauer>12)%>%#Definition of long-term recipients
count(Langzeit)%>%#Frequencies
mutate(Prop=round(n/sum(n)*100)) #Percentages
#By hand
table <- focarb %>%
  filter(Interventionsgruppe == 1, !is.na(Vitality1)) %>%
  summarise(Average_Vitality1 = mean(Vitality1),
            SD_Vitality1 = sd(Vitality1),
            Average_Vitality2 = mean(Vitality2),
            SD_Vitality2 = sd(Vitality2))
#More efficient
table<-focarb %>%
filter(Interventionsgruppe == 1, !is.na(Vitality1)) %>%
summarise(
across(
c(Vitality1, Vitality2, Sozial1, Sozial2),
list(
Average = mean,
SD = sd
),
.names = "{.col}_{.fn}"
)
)
Grouped means tables are a universal and straightforward method for analyzing associations. They display the mean value of the dependent variable by categories of the independent variable.
The dependent variable can either be continuous or have two categories coded as 0 and 1. In the latter case, the table displays the proportion of individuals with a 1 by category of the independent variable. Multiply by 100 and you have percentages.
The independent variable has at least two categories.
Grouped means tables are universally applicable to all types of data since all variable types can be coded into this format.
It is calculated using the
group_by(independent_variable) command before
summarise(name = mean(dependent_variable)).
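Before the real-data example, a minimal self-contained sketch with an invented toy data frame (all names and values made up): the grouped means table shows mean life satisfaction by employment status.

```r
library(dplyr)

# Toy data: employment status (independent, 0/1) and satisfaction (dependent)
toy <- data.frame(
  employed     = c(1, 1, 1, 0, 0, 0),
  satisfaction = c(8, 7, 9, 5, 6, 4)
)

# Grouped means table: mean satisfaction per category of employed
toy %>%
  group_by(employed) %>%
  summarise(mean_satisfaction = mean(satisfaction))
# employed = 0: mean 5; employed = 1: mean 8
```

If the dependent variable were coded 0/1, the same command would return the proportion of 1s per group.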
library(arrow)
library(dplyr)
#Set working directory to your download folder
setwd("C:/Users/kld1/switchdrive/BFH/Wichtige Dokumente/Lehre/Data skills for social work professionals/BFH/Daten/European Social Survey/")
#https://drive.switch.ch/index.php/s/vSytsi7wl1H1yf3
#Load in data
ess <- read_parquet("ESS.parquet")
#Select variables with information on country and occupation
ess <- ess%>%
select(cntry,isco08)
#Filter out people with no information on job
ess <- ess%>%
filter(isco08<10000)
#Is someone a social worker
ess <- ess %>%
mutate(socialworker= as.numeric(isco08 == 2635))
#Add on: What is the country with the highest share of social workers?
ranking <- ess %>%
  group_by(cntry) %>%
  summarise(sharesw = mean(socialworker == 1)) %>%
  arrange(desc(sharesw)) #Sort so the country with the highest share comes first
Bind the data together into one data.frame gesamttabelle and add a variable indicating before versus after January 2024. Aggregate the Anzahl data into one row per date: we want data on the total number of visitors per day. Use group_by(erfassungsdatum) %>% summarise(Totalvisits = sum(Anzahl)).
library(readxl)
library(dplyr)
#https://drive.switch.ch/index.php/s/gdNYHopxWDCV9hr
#https://drive.switch.ch/index.php/s/am1T36ehPL24QuQ
#Read in data
setwd("C:/Users/kld1/switchdrive/BFH/Wichtige Dokumente/Lehre/Data skills for social work professionals/BFH/Daten/Burgdorf/Turnhalle/")
Buero <- read_excel("OJAOffice_Statistikdaten_Jugendbüro Oberburg 23.xlsx",sheet="Statistikdaten 2024")
Turnhalle<- read_excel("OJAOffice_Statistikdaten_offene Turnhalle 24.xlsx",sheet="Statistikdaten 2024")
#Bind together
gesamttabelle <- rbind(Buero,Turnhalle)%>%
mutate(erfassungsdatum=`Erfasst am`,
after=erfassungsdatum>as.Date("2024-01-01"))%>%
group_by(erfassungsdatum)%>%
summarise(Totalvisits=sum(Anzahl))
#Grouped mean
gesamttabelle%>%
group_by(after)%>%
summarise(mean(Totalvisits))
There might be situations where we not only want to know whether there is a difference between two groups but where we
A. want to explore the reasons for differences or
B. want to see whether the differences hold while accounting for third variables.
Examples:
We want to know whether women earn less than men irrespective of the types of jobs they have.
We not only want to know whether social workers are happier with their lives compared to health care workers but also whether this is due to differences in the number of work hours.
In this case, linear regression or Ordinary Least Squares (usually called OLS) is the tool of choice. It can be used in the situation where the dependent variable is continuous or has two categories (e.g. 0 for “Not poor” and 1 for “Poor”), and for all types of independent variables.
The command looks like this:
lm(dependent_variable ~ independent_variable + third_variable, name_of_dataframe).
Usually you would store the output of the model calculations into
an object, e.g. like this
model<-lm(dependent_variable ~ independent_variable + third_variable, name_of_dataframe)
Then you get the most relevant results nicely presented using
huxreg(model) from the huxtable
package.
Relevant for the interpretation are the coefficients (number without brackets) and the test of the statistical significance (stars). You will get two types of coefficients:
Coefficients for continuous independent variables. They tell you how much the dependent variable increases for each unit increase of the independent variable.
Coefficients for categorical independent variables. They tell you the difference in the dependent variable between the category of interest (1 in the case of a 0/1, so-called dummy, variable) and the reference category (0 in the case of a dummy variable).
As soon as you add third variables, these coefficients are “independent” of the levels of third variables. Comparing the coefficients in models without versus with third variables gives you an idea of the importance of these third variables.
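A minimal sketch with simulated data (all numbers invented) illustrates the point about third variables: the outcome depends on work hours only, but hours differ between the groups, so the naive model shows a spurious "group effect" that shrinks once hours are accounted for.

```r
# Simulated example: group differences in y are driven entirely by hours
set.seed(1)
n     <- 200
group <- rep(c(0, 1), each = n / 2)            # dummy independent variable
hours <- 30 + 10 * group + rnorm(n, sd = 5)    # third variable, higher in group 1
y     <- 2 + 0.1 * hours + rnorm(n, sd = 0.5)  # outcome driven by hours alone

model            <- lm(y ~ group)              # naive model: coefficient near 1
model_controlled <- lm(y ~ group + hours)      # controlled: coefficient near 0

coef(model)["group"]
coef(model_controlled)["group"]
```

With the huxtable package loaded, huxreg(model, model_controlled) would present both models side by side, just as in the ESS example that follows.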
#With real data
library(arrow)
library(dplyr)
library(huxtable)
setwd("C:/Users/kld1/switchdrive/BFH/Wichtige Dokumente/Lehre/Data skills for social work professionals/BFH/Daten/European Social Survey/")
#data: https://drive.switch.ch/index.php/s/vSytsi7wl1H1yf3
# Read the data back from the Parquet file
data <- read_parquet("ESS.parquet")
# Select the relevant columns
selected_data <- as.data.frame(data) %>% select(isco08,happy,wkhct,essround)
# Recode happiness and contracted work hours as numeric and set missing-value codes to NA
selected_data <- selected_data %>%
mutate(
happy = case_when(
as.numeric(happy) %in% c(66, 77, 88, 99) ~ NA_real_,
TRUE ~ as.numeric(happy)
),
wkhct = case_when(
as.numeric(wkhct) %in% c(66, 77, 88, 99) ~ NA_real_,
TRUE ~ as.numeric(wkhct)
)
)
# Define ISCO codes for occupations
social_worker_codes_08 <- c(2635)
health_care_worker_codes_08 <- c(2221, 2222, 3221)
# Create occupation column and filter rows with NA in occupation
selected_data <- selected_data %>%
mutate(
occupation = case_when(
isco08 %in% social_worker_codes_08 ~ "Social Worker",
isco08 %in% health_care_worker_codes_08 ~ "Health Care Workers",
TRUE ~ NA_character_
)
) %>%
filter(!is.na(occupation))
# Run the linear regression
model <- lm(happy ~ occupation, selected_data)
#Differences in work hours
selected_data%>%group_by(occupation)%>%summarize(mean(wkhct,na.rm=T))
model_controlled <- lm(happy ~ occupation + wkhct, selected_data)
huxreg(model,model_controlled)
Read in the data of Fokus Arbeit.
Calculate a variable that measures the change in vitality
(e.g. changevit) from the time point before versus after
the intervention (formula: Vitality2 - Vitality1).
Run a regression model with changevit as the
dependent variable and Interventionsgruppe as the
independent variable.
Run a second model including age, gender and duration of benefit
receipt. Do these factors influence whether vitality has changed between
the two measurements? Does accounting for these factors alter the
association between Interventionsgruppe and vitality? Do
differences in these dimensions explain why vitality has changed more in
the intervention group?
library(huxtable)
library(dplyr)
#load data: #https://drive.switch.ch/index.php/s/eatQbxi4RjEFXKC
setwd("C:/Users/kld1/switchdrive/BFH/Wichtige Dokumente/Lehre/Data skills for social work professionals/BFH/Daten/Fokus Arbeit/")
focarb <- read.csv("FokusArbeit_Wirkung.csv")
focarb <- focarb%>%
mutate(changevit=Vitality2-Vitality1)
model <- lm(changevit~Interventionsgruppe,focarb)
model2 <- lm(changevit~Interventionsgruppe+Alter+Weiblich+Bezugsdauer,focarb)
huxreg(model,model2)
The presentation of results is crucial for their impact.
R is very versatile in terms of data preparation and analysis but, unfortunately, raw console output from R is hard to read.
However, there are several powerful packages
(huxtable,ggplot2,officer,flextable) that help you create
nice tables and figures from within R.
Generally, you should try to keep table and graph creation within R. This takes time. However, it allows you to automate the data preparation and analysis process, saving you a lot of time if your data changes (e.g. because of an update) or when you want to change the way you analyze your data. Specifically, you can save your tables or figures, link them into the document you are creating, and automatically update them within your document when needed.
Generally, table and graph creation is nitty-gritty work requiring a lot of detailed knowledge, so we won’t go into it further here. However, ChatGPT is very good at this. Use it extensively for these tasks.
The huxtable package allows you to control many
aspects of the aesthetic design of a table and to save it into an output
file in a handy format (such as Word or PDF). For example, you can tell
it which cells should have bold font and which rows and columns should
be underlined.
However, before you get to the point of changing the looks of the
table, the data must be in correct shape. The tidyr package
brings two important functions:
pivot_longer(): Transforms a
wide dataset (with multiple columns) into a
long dataset, where values from multiple columns are
combined into a single column, and another column is created to label
those values.
pivot_wider(): Transforms a
long dataset (with multiple rows for the same
observation) into a wide dataset, where unique values
in a column become new columns, and the data is spread across these
columns.
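A minimal self-contained sketch of the two reshaping functions, using an invented toy data frame in wide format (one column per measurement; all names and values made up):

```r
library(dplyr)
library(tidyr)

# Wide format: one row per person, one column per measurement
wide <- data.frame(id = c(1, 2), Vitality1 = c(4, 6), Vitality2 = c(5, 8))

# Wide -> long: stack the two measurement columns into one value column
long <- wide %>%
  pivot_longer(
    cols      = c(Vitality1, Vitality2),  # columns to stack
    names_to  = "Measurement",            # new column labeling the values
    values_to = "Value"                   # new column holding the values
  )
long  # 4 rows: one per person x measurement

# Long -> wide: spread the label column back into separate columns
wide_again <- long %>%
  pivot_wider(names_from = Measurement, values_from = Value)
wide_again  # back to 2 rows
```

The same pattern is used below to bring the summarise() output into a table-ready shape.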
#load package
library(dplyr)
library(tidyr)
setwd("C:/Users/kld1/switchdrive/BFH/Wichtige Dokumente/Lehre/Data skills for social work professionals/BFH/Daten/Fokus Arbeit/")
focarb <- read.csv("FokusArbeit_Wirkung.csv")
table<-focarb %>%
filter(Interventionsgruppe == 1, !is.na(Vitality1)) %>%
summarise(
across(
c(Vitality1, Vitality2, Sozial1, Sozial2),
list(
Average = mean,
SD = sd
),
.names = "{.col}_{.fn}"
)
)%>%
pivot_longer(#Reshape into a nice long format; this is why we need tidyr; ChatGPT helps well with this
cols = everything(),
names_to = c("Variable", "Statistic"),
names_sep = "_",
values_to = "Value"
) %>%
pivot_wider(
names_from = Statistic,
values_from = Value
)
# Create a huxtable
ht <- as_hux(table) %>%
set_bold(1, everywhere, TRUE) %>% # Bold the header
set_align(everywhere, 2:3, "center") %>% # Center align numeric columns
set_number_format(2:(nrow(table) + 1), 2:3, "%.2f") %>% # Format numbers to 2 decimals
set_caption("Descriptive Statistics Table") %>% # Add a caption
set_width(0.7)
# Print the table
ht
#Save it
quick_docx(ht, file = "Table.docx")
A powerful R package for creating high-quality data
visualizations is ggplot2. Built around the “grammar of
graphics,” it allows you to construct plots step-by-step, starting with
data and mappings, and then adding layers, scales, labels, and
themes.
Grammar of Graphics Approach: Every plot is a combination of data, geometric objects (such as points or bars), and aesthetic mappings (like color or size), making it easy to customize and iterate.
Flexible Customization: You can adjust virtually any aspect of your visualization, including colors, shapes, line types, and backgrounds, ensuring that the final plot is both informative and visually appealing.
Data-Ready Visualizations: Like with tables, having data in the
correct format is crucial. By using tidyr functions (such as
pivot_longer() and pivot_wider()), you ensure
that your dataset is tidy before you start plotting, thereby simplifying
the creation of clear and insightful graphs.
Consistency and Reproducibility: Through a structured and layered approach, ggplot2 enables consistent styling and easy updates, making it well-suited for reproducible research—an important aspect for data analysis projects in Switzerland.
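A minimal sketch of the layered grammar with an invented toy data frame (all names and values made up): data plus aesthetic mappings, then geometric layers, then labels.

```r
library(ggplot2)

# Toy data: monthly visitor counts (invented numbers)
toy <- data.frame(month = 1:6, visits = c(40, 35, 50, 55, 48, 60))

p <- ggplot(toy, aes(x = month, y = visits)) +  # data and aesthetic mappings
  geom_point() +                                # first geometric layer
  geom_line() +                                 # second layer drawn on top
  labs(x = "Month", y = "Total visits")         # labels
p  # print the plot
# ggsave("toy_plot.pdf", p, height = 4, width = 5)  # save it if needed
```

Each `+` adds one layer, which is why plots are easy to extend step by step.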
library(readxl)
library(dplyr)
library(ggplot2)
#Read in data
setwd("C:/Users/kld1/switchdrive/BFH/Wichtige Dokumente/Lehre/Data skills for social work professionals/BFH/Daten/Burgdorf/Turnhalle/")
Buero <- read_excel("OJAOffice_Statistikdaten_Jugendbüro Oberburg 23.xlsx",sheet="Statistikdaten 2024")
Turnhalle<- read_excel("OJAOffice_Statistikdaten_offene Turnhalle 24.xlsx",sheet="Statistikdaten 2024")
#Bind together and calculate total visitors by months difference to January 2024
plotdata <- rbind(Buero,Turnhalle)%>%
mutate(erfassungsdatum=`Erfasst am`,
months_since_jan_2024 = (as.numeric(format(erfassungsdatum, "%Y")) -
as.numeric(format(as.Date("2024-01-01"), "%Y"))) * 12 +
(as.numeric(format(erfassungsdatum, "%m")) - as.numeric(format(as.Date("2024-01-01"), "%m"))))%>%
group_by(months_since_jan_2024)%>%
summarise(totalvisits=sum(Anzahl))
#Plot data
ggplot(plotdata,aes(x=months_since_jan_2024,y=totalvisits))+
geom_point()
ggsave("Figure.pdf",height=7,width=8)
Use ESS data to calculate a variable whether someone is a social worker (see Solution: simple data manipulation with dplyr).
Calculate the percentage of social workers in each country in the data using grouped mean.
Make a bar chart with ggplot2 where bars are sorted
(put reorder(cntry,share) as the X-aesthetic). Use
+geom_bar(stat="identity") to create bars. Save
it.
Extra task: create a graph of changes over time (hint:
essround measures the round of data collection; round 6
= 2011/2012). Are there more or fewer social workers over time?
library(arrow)
library(dplyr)
library(ggplot2) #needed for the plots below
#Set working directory to your download folder
setwd("C:/Users/kld1/switchdrive/BFH/Wichtige Dokumente/Lehre/Data skills for social work professionals/BFH/Daten/European Social Survey/")
#https://drive.switch.ch/index.php/s/vSytsi7wl1H1yf3
#Load in data
ess <- read_parquet("ESS.parquet")
#Select variables with information on country and occupation
ess <- ess%>%
select(cntry,isco08,essround)
#Filter out people with no information on job
ess <- ess%>%
filter(isco08<10000)
#Is someone a social worker
ess <- ess %>%
mutate(socialworker= ifelse(isco08 == 2635, "Yes","No"))
#Plot
ess%>%
group_by(cntry)%>%
summarise(sharesw=100*mean(socialworker=="Yes"))%>%
ggplot(.,aes(x=reorder(cntry,-sharesw),sharesw))+
geom_bar(stat = "identity")+
labs(x="Country",y="Share of social workers in the total workforce")
ggsave("Figure_socialworkers.pdf",height=7,width=8) #Save the chart, as the exercise asks
ess%>%
group_by(essround)%>%
summarise(sharesw=100*mean(socialworker=="Yes"))%>%
ggplot(.,aes(x=essround,sharesw))+
geom_bar(stat = "identity")+
labs(x="Year",y="Share of social workers in the total workforce")